---
title: Whisper API Usage Guide
createdAt: Thu Jul 18 2024 06:12:37 GMT+0000 (Coordinated Universal Time)
updatedAt: Thu Jul 18 2024 13:40:04 GMT+0000 (Coordinated Universal Time)
---

# Whisper API Usage Guide

## Introduction

This guide shows developers how to use the `aonweb` library to call the Whisper API, which converts speech in audio to text.

## Prerequisites

- Node.js environment
- `aonweb` library installed
- Valid Aonet APPID

## Basic Usage

### 1. Import Required Modules

```js
import { AI, AIOptions } from 'aonweb';
```

### 2. Initialize AI Instance

```js
const ai_options = new AIOptions({
    appId: 'your_app_id_here',
    dev_mode: true
});

const aonweb = new AI(ai_options);
```
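
To keep the APPID out of source control, one option is to read it from an environment variable. The sketch below assumes a variable named `AONET_APP_ID` and a helper called `loadAiConfig`; both names are hypothetical and not defined by `aonweb`. It only builds the plain options object that is passed to `AIOptions` above.

```js
// Hypothetical helper: AONET_APP_ID is an assumed variable name, not part of aonweb.
function loadAiConfig(env, devMode = true) {
    const appId = (env.AONET_APP_ID || "").trim();
    if (!appId) {
        throw new Error("AONET_APP_ID is not set");
    }
    // Same two fields as the initialization example above.
    return { appId, dev_mode: devMode };
}

// Usage, assuming the import from step 1:
// const ai_options = new AIOptions(loadAiConfig(process.env));
```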

### 3. Prepare Input Data Example

```js
const data = {
    input: {
        "audio": "https://replicate.delivery/mgxm/e5159b1b-508a-4be4-b892-e1eb47850bdc/OSR_uk_000_0050_8k.wav",
        "model": "large-v3",
        "translate": false,
        "temperature": 0,
        "transcription": "plain text",
        "suppress_tokens": "-1",
        "logprob_threshold": -1,
        "no_speech_threshold": 0.6,
        "condition_on_previous_text": true,
        "compression_ratio_threshold": 2.4,
        "temperature_increment_on_fallback": 0.2
    }
};
```


```js
const data = {
    input: {
        "audio": "https://replicate.delivery/pbxt/LJr3aqYueyyKOKkIwWWIH67SyvzrAKfCm5tNVYc3uSt7oWy4/4th-dimension-explained-by-a-high-school-student.mp3",
        "model": "large-v3",
        "language": "auto",
        "translate": false,
        "temperature": 0,
        "transcription": "plain text",
        "suppress_tokens": "-1",
        "logprob_threshold": -1,
        "no_speech_threshold": 0.6,
        "condition_on_previous_text": true,
        "compression_ratio_threshold": 2.4,
        "temperature_increment_on_fallback": 0.2
    }
};
```
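
Since the example payloads differ only in a few fields, the shared values can be kept in one place. The following is a minimal sketch: `WHISPER_DEFAULTS` and `buildWhisperData` are hypothetical names (not part of `aonweb`), and the default values are copied from the examples above.

```js
// Defaults copied from the example payloads in this guide.
const WHISPER_DEFAULTS = {
    model: "large-v3",
    language: "auto",
    translate: false,
    temperature: 0,
    transcription: "plain text",
    suppress_tokens: "-1",
    logprob_threshold: -1,
    no_speech_threshold: 0.6,
    condition_on_previous_text: true,
    compression_ratio_threshold: 2.4,
    temperature_increment_on_fallback: 0.2
};

// Hypothetical helper: builds the `data` object for a given audio URL.
function buildWhisperData(audioUrl, overrides = {}) {
    const url = new URL(audioUrl); // throws a TypeError on a malformed URL
    if (url.protocol !== "http:" && url.protocol !== "https:") {
        throw new Error("audio must be an http(s) URL");
    }
    // Overrides win over defaults; the validated URL always wins over both.
    return { input: { ...WHISPER_DEFAULTS, ...overrides, audio: url.href } };
}
```

For example, `buildWhisperData(url, { translate: true })` keeps every default except `translate`.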




### 4. Call the AI Model

```js
const price = 8; // Cost of the AI call
try {
    // Send the input to the Whisper model and wait for the result
    const response = await aonweb.prediction("/predictions/ai/whisper@soykertje", data, price);
    // Handle response
    console.log("Whisper result:", response);
} catch (error) {
    // Error handling
    console.error("Error transcribing audio:", error);
}
```
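
The call above can be wrapped in a reusable function. This is a sketch under assumptions: `transcribe` is a hypothetical helper, and `client` stands for any object exposing `prediction(path, data, price)`, such as the `aonweb` instance created in step 2. Passing the client in as a parameter also makes the function easy to test with a stub.

```js
// Path and price are the values used in the call example above.
const WHISPER_PATH = "/predictions/ai/whisper@soykertje";
const WHISPER_PRICE = 8;

// Hypothetical wrapper around client.prediction(path, data, price).
async function transcribe(client, data, price = WHISPER_PRICE) {
    if (!data || !data.input || !data.input.audio) {
        throw new Error("data.input.audio is required");
    }
    try {
        return await client.prediction(WHISPER_PATH, data, price);
    } catch (error) {
        // Re-throw with context so callers can tell which call failed.
        const wrapped = new Error(`Whisper transcription failed: ${error && error.message}`);
        wrapped.cause = error;
        throw wrapped;
    }
}

// Usage, assuming `aonweb` and `data` from the earlier steps:
// const response = await transcribe(aonweb, data);
```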

### Parameter Description

- `audio` (String): URL of the audio file to transcribe.
- `model` (String): Whisper model size (currently only `large-v3` is supported).
- `language` (String): Language spoken in the audio; the examples above use `auto`, which requests automatic language detection.
- `translate` (Boolean): Translate the text to English when set to `true`.
- `patience` (Number): Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.
- `temperature` (Number): Temperature to use for sampling.
- `transcription` (String): Format of the transcription output (the examples above use `plain text`).
- `suppress_tokens` (String): Comma-separated list of token IDs to suppress during sampling; `-1` suppresses most special characters except common punctuation.
- `logprob_threshold` (Number): If the average log probability is lower than this value, the decoding is treated as failed.
- `no_speech_threshold` (Number): If the probability of the `<|nospeech|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, the segment is considered silence.
- `condition_on_previous_text` (Boolean): If `true`, the previous output of the model is provided as a prompt for the next window; disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
- `compression_ratio_threshold` (Number): If the gzip compression ratio is higher than this value, the decoding is treated as failed.
- `temperature_increment_on_fallback` (Number): Amount by which the temperature is increased when the decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold`.
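
The temperature and threshold parameters work together: in the open-source Whisper reference implementation, a segment is decoded again at a higher temperature whenever the result fails the compression-ratio or log-probability check. The sketch below restates that loop in JavaScript for illustration only. It is not the code the API runs, it omits the `no_speech_threshold` check, and `decodeAt` is a hypothetical callback standing in for the model.

```js
// Temperatures tried in order: start, start + step, ... up to 1.0.
function temperatureSchedule(start, step) {
    if (!(step > 0) || start > 1.0) return [start];
    const temps = [];
    for (let i = 0; start + i * step <= 1.0 + 1e-6; i++) {
        // Round away floating-point noise such as 0.6000000000000001.
        temps.push(Number((start + i * step).toFixed(6)));
    }
    return temps;
}

// A decoding result is rejected when it is too repetitive or too unlikely.
function needsFallback(result, opts) {
    return result.compressionRatio > opts.compression_ratio_threshold
        || result.avgLogprob < opts.logprob_threshold;
}

// decodeAt(temperature) is a hypothetical callback returning
// { text, compressionRatio, avgLogprob } for one audio segment.
function decodeWithFallback(decodeAt, opts) {
    let result;
    const temps = temperatureSchedule(opts.temperature, opts.temperature_increment_on_fallback);
    for (const t of temps) {
        result = decodeAt(t);
        if (!needsFallback(result, opts)) break;
    }
    return result; // the last attempt is returned even if every temperature failed
}
```

With the example values (`temperature: 0`, `temperature_increment_on_fallback: 0.2`), the schedule is 0, 0.2, 0.4, 0.6, 0.8, 1.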

### Notes

- Ensure that the provided audio URL is publicly accessible and that the recording is of good quality to achieve the best transcription result.
- The API may take some time to process the input and generate the result, so consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling recordings of other people's voices.
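
One way to handle long processing times is to combine a loading flag with a client-side timeout. The sketch below is an assumption-laden example: `withTimeout` and `runWithLoading` are hypothetical helpers, `setLoading` is any callback (for instance a UI state setter), and the default limit of 120000 ms is an arbitrary choice, not a documented API limit.

```js
// Rejects if `promise` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `task` (a function returning a promise) while toggling a loading flag.
async function runWithLoading(task, setLoading, ms = 120000) {
    setLoading(true);
    try {
        return await withTimeout(task(), ms);
    } finally {
        setLoading(false); // always reset, on success, failure, or timeout
    }
}

// Usage, assuming `aonweb`, `data`, and `price` from the earlier steps:
// const response = await runWithLoading(
//     () => aonweb.prediction("/predictions/ai/whisper@soykertje", data, price),
//     (isLoading) => console.log("loading:", isLoading)
// );
```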

### Example Response

The API response will contain the transcribed text or other relevant information. Parse and use the response data according to the actual API documentation.
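
Because the exact response shape is not specified in this guide, it can help during development to list every string field in a response together with its path. The helper below is a hypothetical debugging aid, not part of `aonweb`, and the sample object in the usage comment is made up purely for illustration.

```js
// Returns [{ path, value }] for every string found in a nested response object.
function listStringFields(value, path = "response", found = []) {
    if (typeof value === "string") {
        found.push({ path, value });
    } else if (Array.isArray(value)) {
        value.forEach((item, i) => listStringFields(item, `${path}[${i}]`, found));
    } else if (value !== null && typeof value === "object") {
        for (const [key, child] of Object.entries(value)) {
            listStringFields(child, `${path}.${key}`, found);
        }
    }
    return found;
}

// Usage with a made-up object (NOT the real response format):
// listStringFields({ a: { b: "hi" }, c: ["x", 1] });
```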
